Evaluating Visual Representations for Topic Understanding and Their Effects on Manually Generated Topic Labels

Authors

  • Alison Smith
  • Tak Yeon Lee
  • Forough Poursabzi-Sangdeh
  • Jordan Boyd-Graber
  • Kevin Seppi
  • Niklas Elmqvist
  • Leah Findlater
Abstract

Probabilistic topic models are important tools for indexing, summarizing, and analyzing large document collections by their themes. However, promoting end-user understanding of topics remains an open research problem. We compare labels generated by users given four topic visualization techniques (word lists, word lists with bars, word clouds, and network graphs) against each other and against automatically generated labels. Our basis of comparison is participant ratings of how well labels describe documents from the topic. Our study has two phases: a labeling phase where participants label visualized topics and a validation phase where different participants select which labels best describe the topics’ documents. Although all visualizations produce similar quality labels, simple visualizations such as word lists allow participants to quickly understand topics, while complex visualizations take longer but expose multi-word expressions that simpler visualizations obscure. Automatic labels lag behind user-created labels, but our dataset of manually labeled topics highlights linguistic patterns (e.g., hypernyms, phrases) that can be used to improve automatic topic labeling algorithms.


Similar resources

Evaluating Visual Representations for Topic Understanding and Their Effects on Manually Generated Labels

Probabilistic topic models are important tools for indexing, summarizing, and analyzing large document collections by their themes. However, promoting end-user understanding of topics remains an open research problem. We compare labels generated by users given four topic visualization techniques (word lists, word lists with bars, word clouds, and network graphs) against each other and against au...


A review of text mining approaches and their function in discovering and extracting a topic

Background and aim: Four text mining methods are examined, with a focus on understanding and identifying their properties and limitations in subject discovery. Methodology: The study is an analytical review of the literature of text mining and topic modeling. Findings: LSA could be used to classify specific and unique topics in documents that address only a single topic. The other three text min...


Evaluating Visual Preferences of Architects and People Toward Housing Facades, Using Multidimensional Scaling Analysis (MDS)

One of the most important issues to have occupied public opinion and the expert community in recent years is the qualitative and quantitative aspects of housing. Several challenges relate to this topic, ranging from construction, manufacturing, and planning to social, cultural, physical, and architectural design aspects. The thing that has a significant ...


Mining Adverse Events of Dietary Supplements from Product Labels by Topic Modeling

The adverse events of the dietary supplements should be subject to scrutiny due to their growing clinical application and consumption among U.S. adults. An effective method for mining and grouping the adverse events of the dietary supplements is to evaluate product labeling for the rapidly increasing number of new products available in the market. In this study, the adverse events information w...


Aletras, Nikolaos, Timothy Baldwin, Jey Han Lau and Mark Stevenson (to appear) Representing Topics Labels for Exploring Digital Libraries, In Proceedings of Digital Libraries 2014, London, UK

Topic models have been shown to be a useful way of representing the content of large document collections, for example via visualisation interfaces (topic browsers). These systems enable users to explore collections by way of latent topics. A standard way to represent a topic is using a set of keywords, i.e. the top-n words with highest marginal probability within the topic. However, alternativ...




Publication date: 2017